An Architecture for Information Extraction from Figures in Digital Libraries
Abstract
Scholarly documents contain multiple figures representing experimental findings. These figures are generated from data that is not reported anywhere else in the paper. We propose a modular architecture for analyzing such figures. Our architecture consists of the following modules: (1) an extractor for figures and associated metadata (figure captions and mentions) from PDF documents; (2) a search engine over the extracted figures and metadata; (3) an image processing module for automated data extraction from the figures; and (4) a natural language processing module to understand the semantics of the figures. We discuss the challenges in each step and report an extractor algorithm for vector graphics in scholarly documents and a classification algorithm for figures. Our extractor algorithm improves on the state of the art by more than 10%, and the classification process is highly scalable yet achieves 85% accuracy. We also describe a semi-automatic system for data extraction from figures, which is integrated with our search engine to improve the user experience.
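As a rough illustration of how the extractor output (module 1) could feed the search engine (module 2), the following Python sketch indexes figure captions and mentions in a toy inverted index. The names FigureRecord and FigureIndex, the record fields, and the rule that a figure must match every query term are assumptions made for this example; the abstract does not specify the paper's data model or ranking method.

```python
from collections import defaultdict
from dataclasses import dataclass, field
import re


@dataclass
class FigureRecord:
    """One extracted figure plus its metadata (hypothetical schema)."""
    figure_id: str
    caption: str
    mentions: list[str] = field(default_factory=list)


class FigureIndex:
    """Toy inverted index over captions and mentions (assumed design)."""

    def __init__(self) -> None:
        # Maps a lowercase token to the ids of figures whose metadata contains it.
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> set[str]:
        # Lowercase alphanumeric tokens; no stemming or stop-word removal.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, record: FigureRecord) -> None:
        text = " ".join([record.caption, *record.mentions])
        for token in self._tokens(text):
            self._postings[token].add(record.figure_id)

    def search(self, query: str) -> list[str]:
        """Return sorted ids of figures whose metadata contains every query term."""
        terms = self._tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


# Example usage with made-up records.
index = FigureIndex()
index.add(FigureRecord(
    "fig1",
    "Precision versus recall for the classifier",
    ["Figure 1 shows precision improving with more data."],
))
index.add(FigureRecord("fig2", "Runtime of the extractor on PDF documents"))
print(index.search("precision recall"))
```

A real system would likely add ranking and the image-derived features from modules 3 and 4; this sketch only shows the metadata path.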
Similar resources
A Systematic Review of Data Mining Applications in Digital Libraries
Purpose: This study aimed to identify the applications of data mining in the provision of services and in the collection and management of digital libraries. Methodology: This is an applied study in terms of purpose and a qualitative study in terms of method, conducted as a systematic review. For this purpose, articles were obtained by searching the databases of Springer, Emerald, ProQuest,...
Proposed content framework for digital literacy education to users in Iran
Aim: Today, digital literacy, as a set of skills that enable people to use digital space effectively for success in personal, educational and professional life, has become a necessity in all societies, and public libraries are among the most important providers of digital literacy education in the world. Digital literacy education has not been considered in public libraries in Iran. The first s...
A symbol-based fuzzy decision-making approach to evaluate the user satisfaction on services in academic digital libraries
Academic libraries play a significant role in providing core services that include research, teaching and learning. User satisfaction is an important indicator for evaluating the performance of library service. This paper develops a method for measuring user satisfaction in a group decision-making environment. First, the performance of service is evaluated by using a questionnaire survey. The sc...
Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)
Radar signal processing has been an interesting area of research for the realization of programmable digital signal processors using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for the implementation of high-speed, application-specific real-time systems, especially for high-resolution radar. The CORDIC algorithm, in recent times, has turned out to...
A study of the application of semantic technology for organizing information in digital library software
The present study was an attempt to investigate the use of semantic technologies to organize information in digital library software systems. The present study was a practical one which employed a descriptive survey method. The study sample consisted of three digital library software systems entitled Pars Azarakhsh, Parvan Pajoh, and Payam Mashregh. Data were collected through a checklist incl...